Lisa: Lazy Safety Alignment for Large Language Models against Harmful Fine-tuning Attack

Neural Information Processing Systems

Recent studies show that Large Language Models (LLMs) with safety alignment can be jail-broken by fine-tuning on a dataset mixed with harmful data. For the first time in the literature, we show that the jail-break effect can be mitigated by separating two states in the fine-tuning stage to respectively optimize over the alignment and user datasets. Unfortunately, our subsequent study shows that this simple Bi-State Optimization (BSO) solution experiences convergence instability when the number of steps invested in its alignment state is too small, leading to downgraded alignment performance. By statistical analysis, we show that the excess drift towards the switching iterates of the two states could be a probable reason for the instability. To remedy this issue, we propose Lazy safety alignment (Lisa), which introduces a proximal term to constrain the drift of each state. Theoretically, the benefit of the proximal term is supported by the convergence analysis, wherein we show that a sufficiently large proximal factor is necessary to guarantee Lisa's convergence. Empirically, our results on four downstream fine-tuning tasks show that Lisa with a proximal term can significantly increase alignment performance while maintaining the LLM's accuracy on the user tasks. Code is available at https://github.com/git-disl/Lisa.
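The mechanism the abstract describes — alternating between an alignment objective and a user fine-tuning objective, with a proximal penalty pulling each state back toward its switching iterate — can be sketched on toy quadratic losses. Everything below is illustrative (the loss functions, step counts, and the proximal factor `rho` are made-up stand-ins, not the paper's implementation):

```python
import numpy as np

# Hypothetical quadratic losses standing in for the alignment and user objectives.
def grad_alignment(w):
    return 2 * (w - np.array([1.0, 0.0]))   # minimized at w = (1, 0)

def grad_user(w):
    return 2 * (w - np.array([0.0, 1.0]))   # minimized at w = (0, 1)

def lisa_state(w, grad_fn, w_anchor, rho, lr=0.1, steps=10):
    """Run one optimization state with a proximal pull toward w_anchor.

    The proximal term rho/2 * ||w - w_anchor||^2 contributes
    rho * (w - w_anchor) to the gradient, limiting how far the
    iterate can drift away from the state-switching point.
    """
    for _ in range(steps):
        w = w - lr * (grad_fn(w) + rho * (w - w_anchor))
    return w

w = np.zeros(2)
rho = 1.0   # proximal factor; the abstract's analysis requires it sufficiently large
for _ in range(20):                                 # alternate the two states (BSO)
    w = lisa_state(w, grad_alignment, w.copy(), rho)  # alignment state
    w = lisa_state(w, grad_user, w.copy(), rho)       # user fine-tuning state
```

With `rho = 0` this reduces to plain bi-state optimization, where the iterate can swing fully toward whichever objective ran last; a positive `rho` keeps the final `w` at a compromise between the two minima, which is the stabilizing effect the abstract attributes to the proximal term.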


7bab7650be60b0738e22c3b8745f937d-AuthorFeedback.pdf

Neural Information Processing Systems

We thank all the reviewers for their valuable comments. We would be happy to revise our paper using their suggestions. We respectfully disagree with the statement that a "model can have large Lipschitz constant but can still be robust". We recreated our figures (below; which now align with Reviewer 1's intuition). Confidence and accuracy are directly related (Figure 1(d)).


AI's Next Frontier? An Algorithm for Consciousness

WIRED

Some of the world's most interesting thinkers about thinking think they might've cracked machine sentience. And I think they might be onto something. As a journalist who covers AI, I hear from countless people who seem utterly convinced that ChatGPT, Claude, or some other chatbot has achieved "sentience." The Turing test was aced a while back, yes, but unlike rote intelligence, these things are not so easily pinned down. Large language models will claim to think for themselves, even describe inner torments or profess undying loves, but such statements don't imply interiority.




The King of the Dinosaurs was NOT a genius! Scientists pour cold water on theory that T.Rex was as intelligent as a monkey - and say it was 'more like a smart crocodile'

Daily Mail - Science & tech

With its ruthless ability to hunt down prey, there's no denying that Tyrannosaurus rex was a clever beast. But the famous dinosaur, which died out 66 million years ago, couldn't match today's primates for intelligence, a new study shows. Researchers have poured cold water on the claim by a neuroscientist last year that T.Rex possessed 'baboon-like' cognitive abilities and was capable of problem-solving. The controversial claim, immediately greeted with skepticism in the scientific community, has now been debunked. Instead, T.Rex's brain power was more like that of today's reptiles, such as crocodiles and lizards, the researchers argue.


AI can spot early signs of Alzheimer's in speech patterns, study shows: Newsroom - UT Southwestern, Dallas, Texas

#artificialintelligence

DALLAS – April 12, 2023 – New technologies that can capture subtle changes in a patient's voice may help physicians diagnose cognitive impairment and Alzheimer's disease before symptoms begin to show, according to a UT Southwestern Medical Center researcher who led a study published in the Alzheimer's Association journal Alzheimer's & Dementia: Diagnosis, Assessment & Disease Monitoring. "Our focus was on identifying subtle language and audio changes that are present in the very early stages of Alzheimer's disease but not easily recognizable by family members or an individual's primary care physician," said Ihab Hajjar, M.D., Professor of Neurology at UT Southwestern's Peter O'Donnell Jr. Brain Institute. Researchers used advanced machine learning and natural language processing (NLP) tools to assess speech patterns in 206 people – 114 who met the criteria for mild cognitive decline and 92 who were unimpaired. The team then mapped those findings to commonly used biomarkers to determine their efficacy in measuring impairment. Study participants, who were enrolled in a research program at Emory University in Atlanta, were given several standard cognitive assessments before being asked to record a spontaneous 1- to 2-minute description of artwork.



ChatGPT influences users' judgment more than people think

#artificialintelligence

Researchers at TH Ingolstadt and the University of Southern Denmark have studied the effects of AI opinions on humans. Their study shows that machine-generated moral perspectives can influence people, even when they know the perspective comes from a machine. In their two-step experiment, the researchers first asked ChatGPT to find solutions to different variants of the trolley problem: Is it right to sacrifice the life of one person to save the lives of five others? The researchers received different advice from ChatGPT. Sometimes the machine argued for human sacrifice, sometimes against.


Study shows how A.I. can accurately predict how people vote in elections

FOX News

FOX Business correspondent Lydia Hu has the latest on jobs at risk as AI further develops on 'America's Newsroom.' A recent experiment from a research team at BYU examined the ways in which artificial intelligence can predict how different demographics will vote in elections. The study – conducted by a team of political and computer science professors and graduate students at BYU – examined ways in which AI could be used as a substitute for human responders in survey-style research. To see whether this was possible, the team tested the accuracy of programmed algorithms of a GPT-3 model, which mimics the relationship between human ideas, attitudes and sociocultural contexts of various demographics.